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Abstract 

The massively parallel computation of absolute binding free energy with a well-equilibrated 
system (MP-CAFEE) has been developed [H. Fujitani, Y. Tanida, M. Ito, G. Jayachandran, C. 
D. Snow, M. R. Shirts, E. J. Sorin, and V. S. Pande, J. Chem. Phys. 123, 084108 (2005)]. 
As an application, we perform the binding affinity calculations of six theophylline-related ligands 
with RNA aptamer. Basically, our method is applicable when using many compute nodes to 
accelerate simulations, thus a parallel computing system is also developed. To further reduce the 
computational cost, the adequate non-uniform intervals of coupling constant A, connecting two 
equilibrium states, namely bound and unbound, are determined. The absolute binding energies 
AG thus obtained have effective linear relation between the computed and experimental values. 
If the results of two other different methods are compared, thermodynamic integration (TI) and 
molecular mechanics Poisson-Boltzmann surface area (MM-PBSA) by the paper of Gouda et al [H. 
Gouda, I. D. Kuntz, D. A. Case, and P. A. Kollman, Biopolymers 68, 16 (2003)], the predictive 
accuracy of the relative values AAG is almost comparable to that of TI: the correlation coefficients 
(R) obtained are 0.99 (this work), 0.97 (TI), and 0.78 (MM-PBSA). On absolute binding energies 
meanwhile, a constant energy shift of ~ -7 kcal/mol against the experimental values is evident. To 
solve this problem, several presumable reasons are investigated. 
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I. INTRODUCTION 



To accurately compute the free energy difference between two thermal equilibrium states 
(AG) is of interest in computational science and of importance in terms of drug discovery l|. 
It can help us obtain a quantitative understanding of molecular complexes from atomic-scale, 
and also provide valuable information for use in structural refinement for drug design. Gen- 
erally speaking, to discuss chemical reaction, we have to determine the free energy change 
within so-called chemical accuracy (~ 1 kcal/mol), which is very challenging. Popularly, 
several free energy calculations have been carried out within the framework of the thermal 
equilibrium approach. For instance, free energy perturbation (FEP) [2j and thermodynamic 
integration (TI) [3] have been widely used to calculate the free energy change associated 
with the transformation of one ligand into another via thermodynamic cycles 

a a. Cer- 
tainly, the recent improvement in these methods and the increase in computer resources 
have enabled us^to calculate the absolute binding energy concerned with very small organic 
6. 



compounds 



3, 3, H, [3] - However, equilibrium approaches involving most rigorous technique, 
known as the double decoupling method, require long time simulation to maintain the re- 
versibility about the work associated with the decoupling process. As a more approximate 

n 

method, the molecular mechanics Poisson-Boltzmann surface area (MM-PBSA) analysis [1(| 
has been a popular method to obtain absolute binding free energies within a reasonable time. 
In this approach, explicit waters and mobile counter ions are treated by the continuum model 
after obtaining the molecular dynamics (MD) trajectory. Since the normal mode analysis 
is time-consuming, the contribution of solute entropy is often approximated based on the 
average of a few snapshots. On the other hand, in the nonequilibrium statistical approach, 
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ibrium work theorem that is valid in far from equilib- 
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181 ]. although accurate numerical estimates of the 



exponential average, based on a finite number of sampled works, are relatively difficult 



19]. 



Shirts et al. proposed the acceptance ratio (AR) method [20j, |2lJ, [22j, |23| to estimate the 
free energy difference and also calculated the binding free energies for eight FKBP12-ligand 
complexes 24| using the "folding@home" system. Furthermore, our previous work 25| sug- 
gested the efficiency of the use of general AMBER force field (GAFF) j26j for ligands to 
obtain the predictive binding estimates, and simultaneously indicated the importance of 
starting structures for simulations. In our method the use of many independent compute 
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nodes or a grid computing system is effective to accelerate simulations, whereupon we also 
developed a massively parallel computing system, "BioServer" , consisting of 1920 micropro- 
cessor elements, to efficiently perform the binding free energy calculations. Hereafter, our 
method is referred to as the massively parallel computation of absolute binding free energy 
with a well-equilibrated system (MP-CAFEE). 

The principal aim of this study is to explore the practical feasibility of our methodology, 
based on a nonequilibrium approach. We have performed the calculation of the absolute 
binding free energy for theophylline and its analogs to the RNA aptamer. On the same 
system, Gouda et al. reported the results of free energy calculations using two different 
methods, TI and MM-PBSA [27]. Therefore, this system is a good platform to compare 
the ability between two different methods, namely the equilibrium and nonequilibrium ap- 
proach. The paper is organized as follows. We initially describe the theoretical background 
and simulation protocol of binding free energy estimation used through this work, before 
subsequently explaining the choice of coupling constant A intervals connecting two equilib- 
rium states in Sec. II. The results for all RNA-ligands are presented and discussed in Sec. 
III. After pointing out the key features of this system, we conclude in Sec. IV. 



II. COMPUTATIONAL DETAILS 



In general, the methods used to calculate the free energy difference in molecular sim- 
ulations can be classified as either equilibrium or nonequilibrium approaches. Several ap- 
proaches based on thermal equilibrium, e.g. FEP, TI and the pot ential of mean force using 
weighted histogram analysis method with umbrella sampling 2{|, have been attempted to 
compute the free energy difference between two equilibrium states. However, the equilibrium 
approaches encounter the difficulty of removing the nonequilibrium contributions, known as 



the hysteresis problem 29_|, |30j, [3jJ . In order to overcome this difficulty, a long time simu- 



lation must be performed, to retain the reversibility of quasistatic process associating with 
the thermal equilibrium. Recently, Jarzynski's equality exactly relates the free energy dif- 
ference of two equilibrium states i and j to the statistics of works via a nonequilibrium, 
irreversible process that connects them as exp(— AG/k^T) = (exp(—W/k-BT)), where W is 
the work change of a system from i to j under isothermal isobaric (NPT) conditions |32j. A 
remarkable feature of Jarzynski's equality is the fact that the switching time between two 
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states is arbitrary. Subsequently, when the process occurs instantaneously, the work can be 
defined by the potential energy difference as W = £/(Aj,x) — £/(Ai,x), where U and x are 
the potential function and configurational coordinates, respectively. A coupling constant A 
is imposed on the path connecting two states. The fluctuation theorem (FT) related with 
Pi-+j(W) and Pj-+i(W) of the work probability distributions along the non-equilibrium for- 
ward (i — > j direction) and reverse (j — > i direction) processes is derived by Crooks [l3| as 
Pi^(+W) I 'P^i(-W) = exp[(W -AG)/k B T}. The ratio between P^(+W) and P^(-W) 
depends only the value of the free energy difference AG. Shirts et al. pointed out that 
the maximum likelihood AG is given by AR analysis, namely, the control parameter in 
AR analysis must correspond to the free energy difference 211 ] . Since two irreversible work 
distributions in forward and reverse directions cross at W— AG, both distributions must 
have a large overlap to obtain the free energy estimate with high accuracy. Subsequently, 
after applying the multi-staging method, the intermediate stages are inserted onto the path 
between two states % and j, with the total free energy change obtained as the sum of the 
differences. This type of usage, involving the insertion of many intermediate states on the 
path connecting two states, seems to be effective. However, since economical issues also play 
a role because many A require significant computing resources, we have now addressed the 
issue concerned with determining the adequate choice of each A interval. First, potential 
energy is parameterized by A, followed by the introduction of the A dependency to the non- 
bonded interactions. In order to avoid instability near the end-points of the vanishing atoms 
through calculation, the non-bonded interaction energy U(\ c , A LJ ) between the ligand and 
others, including the so-called soft-core potentials, is used 33]. We also use two kind of 
coupling constant A for representing two different potential energies: the fully bound state 
(A c = A LJ = 0) and the unbound state (A c = A LJ = 1). Hereafter, the free energy compo- 
nent resulting from turning off the electrostatic potential (A c = — > 1, A LJ = 0) is referred 
as the electrostatic contribution. We also denote the free energy component for the process 
of (A c = 1, A LJ = — > 1) as the van der Waals contribution. Next, we investigate the conver- 
gence properties of the free energy when the number of equal A spacings (= N) increases. A 
trajectory at each intermediate A« is obtained via the standard MD simulation. As already 
noted, the work associated with the forward path is obtained to compute the potential energy 
difference as Wp(i,i + 1) = U(\+i, Xj) — U(X i ,'x i ). Using a trajectory at A i+ i, the work on 
the reverse path is also obtained as W^.(i, i + 1) = U (Aj+i, Xj + i) — U(Xi, x i+ i), while the free 
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energy AG^+i can be estimated using both work distributions of Wf(i, i + and Wr,(i, 

by the AR algorithm, thus the free energy difference is AG = X^o* AG^+i. In general, AG 

is being converged as the number of N increases. N is defined as a large enough number to 



obtain the well-converged value of AG (= AG ). We also define AG(A n ) = Y^=o 
Finally, to reproduce the feature of AG (A n ), the adequate mesh points of A are determined 
to be satisfied with the following requirements: 1) the small difference between AG and 
AGq; 2) the small root mean square (rms) error between AG(A„) and AGq(A„). The rms 



error used here is defined as 



,N-1 ■ • - V2 



J2i=o I AG(\i) - AG (Aj 
Now, we have applied this manner to an RNA-theophylline system. Using the first 100 
ps MD simulation N were determined as 20 for electrostatic contribution and 40 for van 
der Waals contribution respectively. As can be seen from Table [U we have determined the 
non-uniform mesh points of A with AG difference (= 5G) < 0.05 kcal/mol and rms error 
of AG(A) < 0.1 kcal/mol, leading to a significant reduction in the number of necessary A 
points. The A points obtained were the following: 12 values of A c points (0, 0.1, 0.25, 0.45, 
0.55, 0.65, 0.7, 0.75, 0.8, 0.9, 0.95, 1) and 21 values of A LJ points (0, 0.1, 0.2, 0.275, 0.375, 
0.45, 0.55, 0.65, 0.675, 0.725, 0.75, 0.775, 0.8, 0.825, 0.85, 0.875, 0.9, 0.925, 0.95, 0.975, 1). 
Computed AG(A^) as a function of A« were also shown in Figs, Ufa) and (b). It is clear that 
the feature of the well-converged curves is reproduced well by a small number of non-uniform 
mesh points. 

To estimate the free energy difference, the path from the bound state (A= 0) to the 
unbound state (A= 1) is chosen as follows; the ligand electrostatic interactions with the 
environment are turned off, followed by its van der Waals interactions with the environment, 
whereupon we obtain the solvation energy of ligand as AGg iv (= AGg olv + AGg^ lv ). The free 
energy of the annihilation of the ligand from the ligand-receptor complex (complexation free 
energy) is also calculated as AGcompiex (= AG?com P iex + leK ). We schematically note 

as follows; L( so i) — * £( ga s) on the calculation of the solvation free energy of the ligand (L) and 
RLt so ]\ — > -R(soi) + £(gas) on the calculation of the complexation free energy of the ligand (L) to 
the receptor (R). Consequently, we obtain the absolute binding free energy of the receptor- 
ligand complex as AG = AG Comp iex - AG So i v = AGg omplex + AGc^ mplex ~ AG Soiv ~ AG Soiv 
using the relation of i2L( so i) — > R( S oi) + -k(soi)- 

In our study the three-dimensional structure of the RNA-theophylline complex proposed 
by Clore et al. 34] (PDB code 1015) was used as a modeling. We assumed that the bind- 
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ing sites for six ligands fit into the structural framework of the original RNA-theophylline 
structure because the structures of other complexes were not experimentally determined. 
All MD simulations were performed using a modified version of the GROMACS package 
(v3.1.4) |35] with single precision to speed up the execution. The Amber force field pa- 
rameters (ff99) were used to describe the RNA aptamer, including 1-4 interactions between 
hydrogen atoms, while the force field parameters for Mg 2+ ions 36] were also used from 
the Amber package. The atomic structure of the theophylline molecule was optimized in a 
vacuum using Gaussian 98 (Gaussian, Inc.) with the HF/6-31G* single-point energy calcu- 
lation, while the atomic charges were calculated using the restrained electrostatic potential 
(RESP) method Q and the ANTECHAMBER program was used to assign the GAFF for 
each atom. The chemical structures of the ligands investigated here are shown in Fig. [H 
while the electrostatic potential surface of these molecules, computed using the Adaptive 



Poisson-Boltzmann Solver (APBS) package 38J after using the RESP charge fitting are also 



shown. It is clear that hypoxanthine has a slightly different charge distribution around the 
lower left-hand side of the molecule as compared to other molecules. The TIP3P model for 
water molecules was used to describe the solvent 391 ] . 

Through this study we obeyed the following parameters for all MD simulations. Integra- 
tion of the equation of motion was performed using the leap-frog algorithm and all bonds 
were constrained by LINCS [4o(] with order 8. We used the time step of 2.0 fs while the non- 
bonded pair list of 1 nm was updated every 10 steps. All simulations were carried out at T— 



298 K using the Nose-Hoover [4l|, |42j temperature control and at 1 atm using the Berend- 
sen 43] pressure control respectively, with a time constant of 0.5 ps and a compressibility of 
4.5xl0 -5 bar -1 . The particle mesh Ewald (PME) 44( summation was used to evaluate the 
electrostatic interactions with a grid spacing of approximately 0.12 nm, a cubic spline order 
of 4 and relative tolerance between long and short range energy of 10~ 8 . The L-J interaction 
energy was calculated by a switched cutoff between 0.8 nm and 0.9 nm. We also included 
long range correction to the energy and pressure. We used the truncated octahedron box 
as a unit cell with periodic boundary conditions, while for the solvated ligand system, we 
adopted the unit cell size, maintained no closer than 0.85 nm from each face of the box, 
and introduced about 238 to 282 TIP3P water molecules into the unit cell. RNA-ligand 
complexes were prepared in which the minimum distance was 0.75 nm between the nearest 
unit cell wall on either side, while the number of TIP3P water molecules used were either 
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7192 or 7 
et alt 271 



93. We introduced 3 Mg 2+ ions into the complex according to the paper of Gouda 



46j and also added 26 Na + counter-ions to the complex to neutralize the 



system. 

One of the difficulties for computing the binding free energy is the preparation of start- 
ing structure sampled from the canonical ensemble at the desired temperature T 25j. The 
following procedure was used. We performed energy minimization using the conjugate gra- 
dient method, followed by the MD simulation of 20 ps relaxation with the solute positions 
restrained. In order to attain the thermal equilibrium state (A C =A LJ = 0), 10/15 ns MD 
calculations were typically used for the solvated ligand and the RNA-ligand complexes. 
Through this process, used to conclude the system to be equilibrated, the total potential 
energy of the system and the local electrostatic and van der Waals potential energies be- 
tween RNA and ligand were carefully monitored. Via the multi-stage method we adopted an 
atomic structure at A= after completing the preconditioning MD simulation as a starting 
structure at each intermediate A state. To decrease the uncertainty from statistical error, 
we sampled the MD simulation with 15 and 10 kinds of different initial velocities for the 
solvated ligand system and RNA-ligand complex, respectively. Nonequilibrium works ev- 
ery 50 steps (0.1 ps) were used to compute the free energies, which were obtained as the 
arithmetic average of all estimated values. Through this work we simultaneously performed 
330 (33 independent non-uniform A points x 10 samples) independent MD simulations to 
generate the energetically possible samples for evaluating the binding free energy on the 
Fujitsu BioServer system, consisting of 1920 FR-V microprocessors. 



III. RESULTS AND DISCUSSION 

It was found that a Mg 2+ ion shielded by water molecules and located at the center of 
the U-turn formed by C22-U24 {46] remained unchanged during all the preconditioning MD 
simulations. Conversely however, other Mg 2+ ions immediately left their initial positions, 
and diffused around an RNA aptamer. 
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A. Theophylline 



A 15 ns preconditioning MD simulation was used to reach a thermal stable structure 
in the RNA-theophylline complex. We now consider the energetically favorable structure 
from the perspective of both electrostatic interaction and that of van der Waals. The 
obtained structure is consistent as expected, namely, the negative surface region of an RNA 
strongly interacts with the positive region on a complementary molecule surface. Besides, 
there is a match of theophylline and residue of A28 on the substrate. This result can be 
interpreted based on the view that the van der Waals interaction simultaneously stabilizes 
the theophylline molecule onto the binding pocket. Furthermore, theophylline is bound with 
an upper as well as lower layer. 

Let us now move on to the binding free energy calculation to examine the availability 
of our method. The open circles in Fig. [3] show the features of the convergence of (a) the 
solvation free energy of theophylline and of (b) the complexation free energy of theophylline 
to an RNA as a function of the MD step. Each open circle was obtained by analyzing the 
generated works during each 100 ps MD trajectory. Rough convergence appears visible after 
200 ps and 700 ps for (a) the solvated ligand and (b) the solvated RNA-ligand complex, 
respectively. Subsequently, we used work values from 200 ps to 1 ns (8000 samples) to 
estimate the solvation energy AGg iv and those from 700 ps to 1 ns (3000 samples) to 
estimate the complexation energy AG Complex , and obtained -14.4 kcal/mol for AGs Q i v and 
-30.5 kcal/mol for AGcompiex- Consequently, we also obtained the binding free energy AG 
of -16.1 kcal/mol for the RNA-theophylline system. We have summarized the free energy 
components in Table [III and it can be found that the favorable structure of this complex is 
mainly driven by the van der Waals interaction. 

We further comment on the computational cost required to obtain the converged AG. 
Since the ratio of the simulation time between the solvation and complexation is relatively 
small, we only consider the simulation time for obtaining AGcompiex- In this system, a total 
of 330 ns MD simulations for obtaining AGcompiex were used. However, our approach is 
applicable for the use of many compute nodes to accelerate calculations, meaning the use 
of a parallel computing system, including a grid environment, is the key to diminishing the 
computation turnaround time. 
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B. Caffeine 



Caffeine differs from theophylline by a single methyl group as shown in Fig. [21 Each 
component, once decomposed from estimated binding energies, was tabulated in Table [TT1 
It is evident that the significant difference of 5.1 kcal/mol between the theophylline and 
caffeine bindings is mainly attributable to the electrostatic contribution (4.6 kcal/mol). We 
now address and attempt to explore the reason why the affinity between RNA and caffeine 
is drastically reduced in comparison with that between RNA and theophylline. Due to the 
typical snapshots, both ligands preferably bind within the pocket of an RNA, as illustrated 
in Figs. HJa) and (b). Hereafter, we discuss the results during Ins MD simulations after 
reaching thermal equilibrium states for both systems. The obtained averaged distances of 
TEP(0)-C22(1H4), TEP(hn)-C22(N3), TEP(hn)-C22(02) and TEP(nc)-U24(H3) are 0.22 
nm, 0.19 nm, 0.26 nm and 0.21 nm, respectively, while the averaged distance of CAF(nc)- 
U24(H3) of 0.21 nm was also obtained for the RNA-caffeine complex. These values indicated 
the creation of hydrogen bonds as the bond lengths were less than approximately 0.25 nm. 
This difference in the number of hydrogen bonds between both RNA-caffeine and RNA- 
theophylline complexes may result in a significant difference in the binding energy, so the 
number of hydrogen bonds for both systems were enumerated. At this point, we determined 
the hydrogen bond based on the cutoff angle of the donor-hydrogen-acceptor and the distance 
of the hydrogen-acceptor; values of 60° for cutoff angle and 0.25 nm for cutoff distance were 
used. We obtained hydrogen bonds of 4.3 (±1.0) for the solvated caffeine and the numerical 
value in parentheses indicated the standard deviation. On the other hand, the solvated 
theophylline has 5.4 (±1.1) hydrogen bonds. These results indicated a decrease in the 
hydrogen bond in solvated caffeine as compared to the solvated theophylline. The number 
of hydrogen bonds for the solvated RNA-caffeine complex were 2.8 (±0.8) and 0.9 (±0.2) for 
caffeine-water and RNA-caffeine pairs. We also counted the number of hydrogen bonds for 
the solvated RNA-theophylline complex: 2.3 (±0.7) for theophylline-water and 3.1 (±0.6) 
for RNA-theophylline. This value (~ 3) is easily understood as the hydrogen atom TEP(hn) 
are shared to be bound by both a nitrogen atom (C22(N3)) and an oxygen atom (C22(02)) 
in the solvated RNA-theophylline complex as shown in Fig. |4^a), while these hydrogen bonds 
vanish in the solvated RNA-caffeine complex in Fig. H](b). The hydrogen bond contributing 
to binding energy for the RNA-theophylline system is ~ 0, while the decrease in the number 
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of hydrogen bonds for the RNA-caffeine system is ~ 0.6. The contribution energies to 
binding free energies are roughly ~ kcal/mol for the RNA-theophylline system and ~ 3.6 
kcal/mol for the RNA-caffeine system, based on a presumed hydrogen bond energy of ~ 6 
kcal/mol. Therefore, the decrease in the hydrogen bonds represents the main reason for the 
significant difference in affinity between the RNA-theophylline and RNA-caffeine systems. 



C. Other molecules 



To obtain a thermal equilibrium binding structure, we similarly set the initial locations 
of other molecules in close proximity to the binding site of theophylline for preconditioning 
MD simulations because the electrostatic potential surface of each molecule is almost the 
same as theophylline. The obtained stable atomic configurations of other molecules bound 
with RNA were virtually unchanged. The association of RNA with all six ligands is driven 
primarily by attractive van der Waals interactions AG LJ as tabulated in Table HT1 The results 
of the absolute and relative binding free energies we obtained for six ligands are given in 
Table 1111} which also tabulates the results of other methods, TI and MM-PBSA 27J and 
includes three important characteristics: 1) There is an effective linear relation (slope= 1) 
between the computed absolute binding free energies and experimental ones. We obtained 
the following relation between the computed relative energies AAG ca i c and experimental ones 
AAG exp t for three different methods (the results of TI and MM-PBSA were from Ref. j^): 
AAG calc = AAG e xpt -0.1, R= 0.99 (This work), AAG calc = AAG expt + 0.19, R = 0.97 (TI), 
AAG calc = AAG cxp t + 1-68, i? = 0.78 (MM-PBSA), where R is the correlation coefficient. 
Consequently, the efficiencies of both our method and the TI method for estimating the 
relative binding free energies are virtually comparable. Here, we should point out that 
our method directly estimate the absolute binding energies, whereas the TI method only 
compute the relative binding energies. We also point out that the absolute binding free 
energy for the RNA-hypoxanthine system is partly shifted about 1.5 kcal/mol from the 
linear relation, which means this result might suggest consideration be taken of another 
binding site for hypoxanthine binding. 2) There is a constant energy shift of ~ -7 kcal/mol 
from the experimental values AG exp t as can be seen in Fig. [H Detailed discussions for the 
latter are given below. 

In general, since water molecules play an important role in free energy calculation, we have 
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performed calculations of AU\ oca i as the sum of short-range electrostatic interaction energy 
and van der Waals interaction energy between RNA and ligand. The sample averaged values 
during 1 ns MD simulations after reaching the thermal equilibrium state are shown in Fig. 
In comparison with Fig. our method can reproduce the trend in both 1-methylxanthine 
and xanthine well, although the large shift (~ 2 kcal/mol) from a linear relation between 
Af/iocai and AG expt was evident. Consequently, our method evidently takes the effect of 
water molecules into consideration. 



D. Constant energy shift 

The computed absolute binding free energies were approximately -7 kcal/mol smaller than 
those obtained by the experiment. How do we solve the puzzle underlying the difference 
between them? We briefly comment on the presumed reasons to solve this puzzle. First, it is 
well known that nucleic acid, e.g. RNA or DNA, is generally flexible in solution [43]. There 
are a number of metastable states in free energy space and long term residence at a single 
valley, resulting in anomalously slow dynamics. Namely, the ensemble average of several 
capable atomic structures was observed in the experiment. Thus, in the RNA-theophylline 
system we examined exploration of another metastable structure using a simulated annealing 
technique as follows: We first prepared a slightly expanded unit cell that was sufficient 
to treat the RNA relaxing. Theophylline and RNA aptamer was solvated in a truncated 
octahedron box with 3 Mg 2+ ions, 26 Na + counter-ions and 9325 TIP3P water molecules. 
Next, energy minimization and position restrained MD simulation were carried out in the 
same manner, as described in II. We also set a simulation temperature of 398 K to disturb the 
stable complex structure during the simulation time of 100 ps. Consequently, a (meta)stable 
atomic configuration was obtained by annealing to 298 K. One of the characteristic features 
was that a Mg 2+ ion leave from the center of the U-turn of C22-U24. Binding free energy of - 
3.3 kcal/ mol was obtained by AR analysis with simulation of up to 1 ns MD, by discarding the 
first 400 ps runs, which is significantly small compared with the strongly binding structure 
described in Sec. IIIA. The feature of convergence of the free energy estimation as a function 
of the MD step for this weakly binding structure was also drawn in Fig. [3] We further divide 
the binding free energy into electrostatic and van der Waals components. Binding free 
energy of -0.4 kcal/mol is contributed from the electrostatic part, whereas, -3.0 kcal/mol 
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is contributed from the van der Waals part. Second, we examined the influence of the 
atomic charges of the ligand. Here, AM1-BCC charges of the ligand generated from 
MOPAC2002 were tested. AGsdv of -17.7 kcal/mol was obtained, while AGcompiex of - 
33.7 kcal/mol was also obtained up to 1 ns MD simulations. Both values increased by ~ 
3 kcal/mol as compared to those using RESP charges; hence we obtained binding energy 
of -15.9 kcal/mol, which is very close to the value of -16.1 kcal/mol using RESP charges. 
Third, the convergence of binding energy was explored. The solvation energy AGgo lv was 
reduced by approximately 0.5 kcal/mol as the cutoff radius was increased from 0.9 nm to 
1 nm. The AGco mplox was also reduced by 0.2 kcal/mol as the cutoff radius increased from 
0.9 nm to 1 nm, meaning the correction is substantially only 0.3 kcal/mol. Fourth, we 
previously reported that ~ 3 kcal/mol energy shift from the experimental values is existed 
in FKBP12-ligand system [25| . Then, several researchers have tried to solve this issue 
by different approaches 49, 50, 5l|. Currently, we believe that this energy shift can be 



interpreted by the difference of the force field used. The energy shift vanishes when we use 
the GAFF for representing the potential of the protein to further improve the consistency. 
Therefore, we examined computing the binding energy using GAFF representing all atoms 
in the RNA-theophylline system. AGsoiv of -14.2 kcal/mol was obtained, while AGcompiex 
of -27.0 kcal/mol was also obtained, so we obtained a free energy difference AG of -12.8 
kcal/mol. This value still represents an approximate shift from the experimental values by 
4.0 kcal/mol. Finally, since an RNA has significant negative charges, we believe the fact 
that the charge polarizability of an atom is influenced by electrostatic interactions with its 
surrounding atoms, becomes important 52|, |53j, |54( . For this reason the polarizable potential 



model might be considered, however, as the computational cost rises significantly, we have 
not yet examined this ability. Consequently, this puzzle remains a major challenge in the 
RNA-ligand system. 



IV. CONCLUSION 



In conclusion, we have introduced the method (MP-CAFEE) as a highly accurate means 
of obtaining the absolute binding free energy. This method has been applied to investi- 
gate the properties of the RNA-ligand system. Since our method is suitable for the use 
of many compute nodes, we have also developed a massively parallel computing system to 
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effectively accelerate simulations. To further reduce the computational cost, we determined 
the adequate non-uniform intervals of two coupling constants and consequently obtained 
the following details: First, our estimated absolute binding free energies correlate well in 
terms of a linear fit to the experimental values when all ligands are assumed to be preferable 
to bind the same binding pocket. By comparing the results of two other methods, TI and 
MM-PBSA 27J, the accuracy of this method is almost comparable to that of TI in terms of 



relative binding free energy. With this in mind, we believe this method to be a promising 
tool to quantitatively predict the absolute binding free energy in the drug design domain. 
Second, we expect a considerable difference in the binding affinity to RNA between theo- 
phylline and caffeine to be attributed to the difference in hydrogen bonds of the related 
environment by analyzing the free energy component. Finally, we find the constant energy 
of ~ -7 kcal/mol shifted from the experimental values for all ligands on absolute binding free 
energy. Though considering several presumable solutions to this problem, this area remains 
incompletely understood. 
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TABLE I: Obtained free energy differences and rms values comparing with well-converged values 
using condensed equal A spacings, 20 uniformed points for A c , and 40 uniformed points for A LJ 
respectively. These values are deduced from the first 100 ps MD trajectory. 









solvation 


complex 






N 


5G a 


rms 


5G a 


rms 


A c 


2 


-0.14 


0.09 


0.08 


0.18 




5 


0.11 


0.09 


0.10 


0.14 




10 


0.03 


0.03 


0.06 


0.07 




ll 6 


0.03 


0.03 


-0.01 


0.03 


A LJ 


2 


-6.45 


3.73 


-10.20 


5.89 




5 


-2.69 


1.16 


-2.42 


1.19 




10 


-0.28 


0.13 


-1.48 


0.60 




20 


0.11 


0.09 


0.27 


0.13 




20 6 


0.05 


0.08 


-0.05 


0.05 



a 5G = AG — AGo, where AGq is well-converged value. 

6 These values are obtained using an adaptive A spacing set (see, in II). 



TABLE II: Electrostatic contributions and those of van der Waals, decomposed from obtained 
binding free energies for each RNA-ligand complex. 



AGsolv AGcomplex AG 



Ligand 






aqC 

Complex 


Complex 


AG C 


AG LJ 


Theophylline 


-14.2 


-0.2 


-19.2 


-11.3 


-5.0 


-11.1 


3-Methylxanthine 


-16.8 


-0.5 


-22.1 


-9.9 


-5.3 


-9.4 


Xanthine 


-19.1 


-1.0 


-24.8 


-10.0 


-5.7 


-9.0 


1-Methylxanthine 


-16.7 


-0.6 


-21.3 


-10.2 


-4.6 


-9.6 


Hypoxanthine 


-18.2 


-0.3 


-22.8 


-7.2 


-4.6 


-6.9 


Caffeine 


-12.7 


0.3 


-13.1 


-10.3 


-0.4 


-10.6 
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TABLE III: Absolute binding free energies obtained for RNA aptamer and theophylline and its 
analogs in kcal/mol. 



Experiment" TI 6 MM-PBSA 6 This work 



Ligand 


AGoxp 


AAG exp 


AAGbind 


AG^MM+solv) 


AAG( MM+solv ) AG ca i c 


AAG ca i c 


Theophylline 


-8.85 








-18.51 





-16.1 





3- Met hy lxant hine 


-7.77 


1.08 


1.36 


-15.27 


3.25 


-14.7 


1.4 


Xanthine 


-6.91 


1.94 


1.64 


-12.97 


5.54 


-14.6 


1.6 


1-Methylxanthine 


-6.88 


1.97 


1.82 


-14.24 


4.27 


-14.2 


1.9 


Hypoxanthine 


-5.87 


2.98 








-11.6 


4.5 


Caffeine 


-3.35 


5.50 


6.63 


-12.67 


5.84 


-11.0 


5.1 



"These values are obtained using the formula -7?Tln K^, where is the individual competitor dissociation 
45(. T=298 K. 



constant 
b These values are from Ref. 
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FIG. 1: Free energy convergence as a function of the number of A spacing for the RNA-Theophylline 
system; (a) AG C vs. A c . Open symbols are our reduced 12 A c points of (0, 0.1, 0.25, 0.45, 0.55, 
0.65, 0.7, 0.75, 0.8, 0.9, 0.95, 1). The solid lines are generated from 20 uniform mesh points of A . 
(b) AG LJ vs. A LJ . Open symbols are our reduced 21 A LJ points of (0, 0.1, 0.2, 0.275, 0.375, 0.45, 
0.55, 0.65, 0.675, 0.725, 0.75, 0.775, 0.8, 0.825, 0.85, 0.875, 0.9, 0.925, 0.95, 0.975, 1), while the 
solid lines are generated from 40 uniform mesh points of A LJ . 
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FIG. 2: Chemical structures of ligands investigated. The dissociation constant (in jjM) is 
included in each parenthesis. The electrostatic potential surfaces of molecules are also shown with 
red representing the negative potential and blue representing the positive potential respectively (-4 
k B T/e to 4 k B T/e). 
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FIG. 3: Free energy of theophylline-RNA aptamer as a function of MD step: (a) AGsoiv f° r solvated 
theophylline, (b) AGcomplex for the RNA-theophylline complex. Open circle denotes a strongly 
binding structure, while the open square denotes a weakly binding structure (see, in HID). Error 
bars in both graphs are drawn from the maximum value to minimum value. 
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FIG. 4: (a) Theophylline molecule and its surroundings in the RNA-theophylline complex. Only 
two residues related to hydrogen bonds are drawn and candidates for hydrogen bonds are depicted 
as dashed lines, (b) Caffeine molecule and its surroundings in the RNA-caffeine complex, while 
hydrogen bonds with residue in the same plane are also drawn in the form of a dashed line. 
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FIG. 5: Computed binding free energies AG ca i c vs. experimentally measured binding free energies 
AGexpt for six ligands. The line of slope= 1 is also drawn as a guide. The abbreviations used here 
are TEP as theophylline, TME as 3-methylxanthine, XAN as xanthine, OME as 1-methylxanthine, 
HXA as hypoxanthine and CAF as caffeine. 
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FIG. 6: Local potential energies computed AZ7] OC ai vs - experimental binding free energies AG exp t 
for six ligands. The line of slope= 1 is also drawn as a guide, while the abbreviations used are the 
same as those in Fig. 5. 
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